

BitsFusion: 1.99 bits Weight Quantization of Diffusion Model

Sui, Yang, Li, Yanyu

Neural Information Processing Systems

Diffusion-based image generation models have achieved great success in recent years, demonstrating the capability to synthesize high-quality content. However, these models contain a huge number of parameters, resulting in a significantly large model size. Storing and transferring them is a major bottleneck for various applications, especially those running on resource-constrained devices.



Dynamic Prompt Optimizing for Text-to-Image Generation

Mo, Wenyi, Zhang, Tianyu, Bai, Yalong, Su, Bing, Wen, Ji-Rong, Yang, Qing

arXiv.org Artificial Intelligence

Text-to-image generative models, specifically those based on diffusion models like Imagen and Stable Diffusion, have made substantial advancements. Recently, there has been a surge of interest in the delicate refinement of text prompts. Users assign weights or alter the injection time steps of certain words in the text prompts to improve the quality of generated images. However, the success of fine-control prompts depends on the accuracy of the text prompts and the careful selection of weights and time steps, which requires significant manual intervention. To address this, we introduce the Prompt Auto-Editing (PAE) method. Besides refining the original prompts for image generation, we further employ an online reinforcement learning strategy to explore the weights and injection time steps of each word, leading to dynamic fine-control prompts. The reward function during training encourages the model to consider aesthetic score, semantic consistency, and user preferences. Experimental results demonstrate that our proposed method effectively improves the original prompts, generating visually more appealing images while maintaining semantic alignment. Code is available at https://github.com/Mowenyii/PAE.


DiffChat: Learning to Chat with Text-to-Image Synthesis Models for Interactive Image Creation

Wang, Jiapeng, Wang, Chengyu, Cao, Tingfeng, Huang, Jun, Jin, Lianwen

arXiv.org Artificial Intelligence

We present DiffChat, a novel method to align Large Language Models (LLMs) to "chat" with prompt-as-input Text-to-Image Synthesis (TIS) models (e.g., Stable Diffusion) for interactive image creation. Given a raw prompt/image and a user-specified instruction, DiffChat can effectively make appropriate modifications and generate the target prompt, which can be leveraged to create a high-quality target image. To achieve this, we first collect an instruction-following prompt engineering dataset named InstructPE for the supervised training of DiffChat. Next, we propose a reinforcement learning framework with feedback on three core criteria for image creation, i.e., aesthetics, user preference, and content integrity. It involves an action-space dynamic modification technique to obtain more relevant positive samples and harder negative samples during off-policy sampling. Content integrity is also introduced into the value estimation function to further improve the produced images. Our method exhibits superior performance over baseline models and strong competitors in both automatic and human evaluations, fully demonstrating its effectiveness.


Is AI Image Generation Art?

#artificialintelligence

If you're debating whether AI image generation is art or not, this blog post is for you. Learn about the different ways you can generate your own AI art. For centuries, art has been seen as a product of the human imagination, a way for us to express our creativity and view of the world around us. But what happens when artificial intelligence (AI) is used to generate images? Some argue that generated images cannot be classified as art because they lack the creativity and human emotion that traditional artwork has.


We Need To Think Bigger About AI And Art

#artificialintelligence

From abstract art, digital painting, complex sculpture, and architectural visualisation to a five-year-old's hand drawing, whatever you ask for, the AI makes it, or at least tries its best to. Does that cheapen the work of someone who puts their hands into researching, planning, sketching, and developing their artworks, versus an AI that has simply learned from studying millions of existing images, figured out what we humans like to look at, and can replicate it in mere minutes? You can create artwork without specifying the colour or even the form of your subject, simply allowing the AI to take the reins while you provide only the inspiration and mood you want to achieve. It is not perfect, though, as you will quickly notice: it cannot, at the moment at least, understand the nuanced interactions within human culture, and everything presented in the resulting image is based on learning from what people have already created. Allowing it to run without supervision could in turn create a visual language in our society that encourages bigotry and bias, and limits the true creativity that inspires progress through diversity.